137 research outputs found

    Concentration inequalities under sub-Gaussian and sub-exponential conditions

    We prove analogues of the popular bounded difference inequality (also called McDiarmid’s inequality) for functions of independent random variables under sub-Gaussian and sub-exponential conditions. Applied to vector-valued concentration and the method of Rademacher complexities, these inequalities allow an easy extension of uniform convergence results for PCA and linear regression to the case of potentially unbounded input and output variables.
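
    For reference, the classical bounded difference inequality that these results generalize can be stated as follows. This is the standard textbook form, not the paper's sub-Gaussian or sub-exponential version; the symbols f, c_i, and t are notation chosen here for illustration.

```latex
% Classical bounded difference (McDiarmid) inequality.
Let $X = (X_1, \dots, X_n)$ be independent random variables and let $f$ satisfy
\[
  \sup_{x_1, \dots, x_n,\, x_i'}
  \bigl| f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \bigr|
  \le c_i
  \qquad \text{for } i = 1, \dots, n .
\]
Then, for every $t > 0$,
\[
  \Pr\bigl( f(X) - \mathbb{E} f(X) \ge t \bigr)
  \le \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \right).
\]
```

    The finite constants c_i are exactly what is unavailable when the variables are unbounded, which is the situation the abstract addresses.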

    Multi-Task and Meta-Learning with Sparse Linear Bandits

    Motivated by recent developments on meta-learning with linear contextual bandit tasks, we study the benefit of feature learning in both the multi-task and meta-learning settings. We focus on the case that the task weight vectors are jointly sparse, i.e., they share the same small set of predictive features. Starting from previous work on standard linear regression with the group-lasso estimator, we provide novel oracle inequalities for this estimator when samples are collected by a bandit policy. Subsequently, building on a recent lasso-bandit policy, we investigate its group-lasso variant and analyze its regret bound. We specialize the proposed policy to the multi-task and meta-learning settings, demonstrating its theoretical advantage. We also point out a deficiency in the state-of-the-art lower bound and observe that our method has a smaller upper bound. Preliminary experiments confirm the effectiveness of our approach in practice.
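
    Joint sparsity of the kind described above is commonly encouraged with a group-lasso penalty, whose proximal operator shrinks whole rows of the weight matrix at once. The sketch below shows only that row-wise shrinkage step, assuming rows index features and columns index tasks; the function name and the layout are illustrative choices, and this is not the bandit policy studied in the paper.

```python
import numpy as np

def group_soft_threshold(W, tau):
    """Proximal operator of tau * sum_j ||W[j, :]||_2 (row-wise shrinkage).

    Hypothetical helper: rows are features, columns are tasks.
    """
    W = np.asarray(W, dtype=float)
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Avoid division by zero for all-zero rows; such rows stay zero.
    safe_norms = np.where(row_norms > 0.0, row_norms, 1.0)
    # Rows with norm below tau are set to zero, the rest are shrunk.
    scale = np.maximum(0.0, 1.0 - tau / safe_norms)
    return scale * W
```

    A feature whose row has small norm across all tasks is removed for every task simultaneously, which is how the shared support arises.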

    The Advantage of Conditional Meta-Learning for Biased Regularization and Fine-Tuning

    Biased regularization and fine-tuning are two recent meta-learning approaches. They have been shown to be effective at tackling distributions of tasks in which the tasks’ target vectors are all close to a common meta-parameter vector. However, these methods may perform poorly on heterogeneous environments of tasks, where the complexity of the tasks’ distribution cannot be captured by a single meta-parameter vector. We address this limitation by conditional meta-learning, inferring a conditioning function that maps a task’s side information into a meta-parameter vector appropriate for the task at hand. We characterize properties of the environment under which the conditional approach brings a substantial advantage over standard meta-learning, and we highlight examples of environments, such as those with multiple clusters, satisfying these properties. We then propose a convex meta-algorithm providing a comparable advantage also in practice. Numerical experiments confirm our theoretical findings.
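
    To make the idea of biased regularization concrete, the following sketch solves a within-task ridge problem whose penalty pulls the weights toward a meta-parameter vector theta instead of toward zero. The square loss, the scaling of the objective, and the name biased_ridge are assumptions made for illustration; the paper's setting may use other losses.

```python
import numpy as np

def biased_ridge(X, y, theta, lam):
    """Minimize (1/n) * ||X w - y||^2 + lam * ||w - theta||^2 in closed form.

    Hypothetical helper: theta plays the role of the meta-parameter vector.
    """
    n, d = X.shape
    # Substituting v = w - theta turns the problem into ordinary ridge
    # regression on the residual y - X @ theta.
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ (y - X @ theta) / n
    return theta + np.linalg.solve(A, b)
```

    A large lam keeps the solution near theta, while a small lam recovers least squares. In the conditional variant described above, theta would be produced by a function of the task's side information rather than being shared by all tasks.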

    Fitting Spectral Decay with the k-Support Norm

    The spectral k-support norm enjoys good estimation properties in low rank matrix learning problems, empirically outperforming the trace norm. Its unit ball is the convex hull of rank k matrices with unit Frobenius norm. In this paper we generalize the norm to the spectral (k,p)-support norm, whose additional parameter p can be used to tailor the norm to the decay of the spectrum of the underlying model. We characterize the unit ball and we explicitly compute the norm. We further provide a conditional gradient method to solve regularization problems with the norm, and we derive an efficient algorithm to compute the Euclidean projection on the unit ball in the case p = ∞. In numerical experiments, we show that allowing p to vary significantly improves performance over the spectral k-support norm on various matrix completion benchmarks, and better captures the spectral decay of the underlying model.
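
    The description of the unit ball of the spectral k-support norm given above can be written out as below. The notation (the norm symbol, the matrix dimensions d and m) is chosen here for illustration, and the set is written with "at most" conditions, which is an assumption about the intended reading of "rank k" and "unit Frobenius norm".

```latex
% Unit ball of the spectral k-support norm, as described in the abstract.
\[
  \bigl\{ W \in \mathbb{R}^{d \times m} : \|W\|_{(k)} \le 1 \bigr\}
  = \operatorname{conv}
  \bigl\{ W \in \mathbb{R}^{d \times m} :
          \operatorname{rank}(W) \le k,\ \|W\|_F \le 1 \bigr\}.
\]
% Two special cases:
%   k = 1          gives the trace (nuclear) norm ball,
%   k = min(d, m)  gives the Frobenius norm ball.
```

    The two special cases show how k moves the norm between the trace norm and the Frobenius norm.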

    Implicit Kernel Meta-Learning Using Kernel Integral Forms

    Meta-learning algorithms have made significant progress in image classification, but less attention has been given to the regression setting. In this paper we propose to learn the probability distribution representing a random feature kernel that we wish to use within kernel ridge regression (KRR). We introduce two instances of this meta-learning framework, learning a neural network pushforward for a translation-invariant kernel and an affine pushforward for a neural network random feature kernel, both mapping from a Gaussian latent distribution. We learn the parameters of the pushforward by minimizing a meta-loss associated with the KRR objective. Since the resulting kernel does not admit an analytical form, we adopt a random feature sampling approach to approximate it. We call the resulting method Implicit Kernel Meta-Learning (IKML). We derive a meta-learning bound for IKML, which shows the role played by the number of tasks T, the task sample size n, and the number of random features M. In particular, the bound implies that M can be chosen independently of T and depends only mildly on n. We introduce one synthetic and two real-world meta-learning regression benchmark datasets. Experiments on these datasets show that IKML performs best or close to best when compared against competitive meta-learning methods.
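
    The random feature approximation inside KRR can be sketched as follows. This minimal example draws the frequencies from a fixed standard Gaussian, which corresponds to a Gaussian kernel, whereas IKML as described above learns a pushforward of the latent Gaussian; all function names, the toy data, and the hyperparameter values are assumptions for illustration.

```python
import numpy as np

def random_fourier_features(X, omega, phase):
    """Map inputs to M random cosine features for a translation-invariant kernel."""
    M = omega.shape[0]
    return np.sqrt(2.0 / M) * np.cos(X @ omega.T + phase)

def fit_krr_random_features(X, y, omega, phase, lam):
    """Ridge regression in feature space, approximating kernel ridge regression."""
    Phi = random_fourier_features(X, omega, phase)
    n, M = Phi.shape
    A = Phi.T @ Phi + lam * n * np.eye(M)
    return np.linalg.solve(A, Phi.T @ y)

def predict(X, weights, omega, phase):
    return random_fourier_features(X, omega, phase) @ weights

# Toy single-task usage: fit y = sin(x) with M = 200 random features.
rng = np.random.default_rng(0)
M, d = 200, 1
omega = rng.standard_normal((M, d))          # fixed Gaussian frequencies (not learned)
phase = rng.uniform(0.0, 2.0 * np.pi, M)     # random phases
X_train = rng.uniform(-3.0, 3.0, (100, d))
y_train = np.sin(X_train[:, 0])
weights = fit_krr_random_features(X_train, y_train, omega, phase, lam=1e-3)

X_test = np.linspace(-3.0, 3.0, 50)[:, None]
mse = np.mean((predict(X_test, weights, omega, phase) - np.sin(X_test[:, 0])) ** 2)
```

    In a meta-learning variant, omega would be the output of a learned map applied to Gaussian samples, and the parameters of that map would be trained across tasks.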

    Distributed Zero-Order Optimization under Adversarial Noise

    We study the problem of distributed zero-order optimization for a class of strongly convex functions. They are formed by the average of local objectives, each associated with a different node in a prescribed network. We propose a distributed zero-order projected gradient descent algorithm to solve the problem. Exchange of information within the network is permitted only between neighbouring nodes. An important feature of our procedure is that it can query only function values, subject to a general noise model that does not require zero-mean or independent errors. We derive upper bounds for the average cumulative regret and optimization error of the algorithm which highlight the role played by a network connectivity parameter, the number of variables, the noise level, the strong convexity parameter, and smoothness properties of the local objectives. The bounds indicate some key improvements of our method over the state-of-the-art, both in the distributed and standard zero-order optimization settings. We also comment on lower bounds and observe that the dependence of the bound on certain function parameters is nearly optimal.
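
    The general shape of such a procedure can be sketched as below: each node estimates a gradient from two noisy function values, mixes its iterate with its neighbours through a doubly stochastic matrix W, and projects back onto a ball. The two-point sphere-sampling estimator, the step size 1/t, the smoothing parameter t^(-1/4), the ring network, and the quadratic local objectives are all assumptions chosen for illustration, not the paper's exact algorithm or tuning.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def two_point_gradient(f, x, h, rng):
    """Gradient estimate from two function values along a random unit direction."""
    d = x.shape[0]
    zeta = rng.standard_normal(d)
    zeta /= np.linalg.norm(zeta)
    return (d / (2.0 * h)) * (f(x + h * zeta) - f(x - h * zeta)) * zeta

def distributed_zero_order_pgd(local_fs, W, x0, steps, radius, rng):
    """Each row of X is one node's iterate; W mixes neighbouring nodes only."""
    n = len(local_fs)
    X = np.tile(x0, (n, 1)).astype(float)
    for t in range(1, steps + 1):
        eta = 1.0 / t        # assumed step size (strong convexity parameter 1)
        h = t ** -0.25       # assumed smoothing parameter
        G = np.stack([two_point_gradient(f, X[i], h, rng)
                      for i, f in enumerate(local_fs)])
        X = W @ X - eta * G  # neighbour averaging plus local gradient step
        X = np.stack([project_ball(x, radius) for x in X])
    return X

def make_local_objective(a, sigma):
    """Quadratic local objective with bounded, deterministic, non-zero-mean noise."""
    def f(x):
        return 0.5 * np.sum((x - a) ** 2) + sigma * np.sin(17.0 * np.sum(x))
    return f

# Toy usage: four nodes on a ring; the average objective is minimized at mean(a).
a = np.array([[2.0, -1.0, 0.5], [0.0, -1.0, 0.5], [1.0, 0.0, 0.5], [1.0, -2.0, 0.5]])
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
local_fs = [make_local_objective(a_i, sigma=1e-3) for a_i in a]
rng = np.random.default_rng(0)
X_final = distributed_zero_order_pgd(local_fs, W, np.zeros(3), 3000, 3.0, rng)
```

    The zero entries of W encode which nodes are not neighbours, so information travels only along the edges of the ring.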

    Mistake Bounds for Binary Matrix Completion

    We study the problem of completing a binary matrix in an online learning setting. On each trial we predict a matrix entry and then receive the true entry. We propose a Matrix Exponentiated Gradient algorithm [1] to solve this problem. We provide a mistake bound for the algorithm, which scales with the margin complexity [2, 3] of the underlying matrix. The bound suggests an interpretation where each row of the matrix is a prediction task over a finite set of objects, the columns. Using this we show that the algorithm makes a number of mistakes which is comparable up to a logarithmic factor to the number of mistakes made by the Kernel Perceptron with an optimal kernel in hindsight. We discuss applications of the algorithm to predicting as well as the best biclustering and to the problem of predicting the labeling of a graph without knowing the graph in advance.

    The Role of Global Labels in Few-Shot Classification and How to Infer Them

    Few-shot learning is a central problem in meta-learning, where learners must quickly adapt to new tasks given limited training data. Recently, feature pre-training has become a ubiquitous component in state-of-the-art meta-learning methods and has been shown to provide significant performance improvement. However, there is limited theoretical understanding of the connection between pre-training and meta-learning. Further, pre-training requires global labels shared across tasks, which may be unavailable in practice. In this paper, we show why exploiting pre-training is theoretically advantageous for meta-learning, and in particular the critical role of global labels. This motivates us to propose Meta Label Learning (MeLa), a novel meta-learning framework that automatically infers global labels to obtain robust few-shot models. Empirically, we demonstrate that MeLa is competitive with existing methods and provide extensive ablation experiments to highlight its key properties.